THERMAL TOLERANT EXOGLUCANASE FROM ACIDOTHERMUS CELLULOLYTICUS

Government Interests 

The United States Government has rights in this invention under Contract No. DE-AC36-99GO10337
between the United States Department of Energy and the National Renewable Energy
Laboratory, a Division of the Midwest Research Institute.

Field of the Invention 

The invention generally relates to a novel exoglucanase from Acidothermus cellulolyticus, Guxl. 
More specifically, the invention relates to purified and isolated Guxl polypeptides, nucleic acid 
molecules encoding the polypeptides, and processes for production and use of Guxl, as well as 
variants and derivatives thereof. 

Background of the Invention

Plant biomass as a source of energy production can include agricultural and forestry products, 
associated by-products and waste, municipal solid waste, and industrial waste. In addition, over 50 
million acres in the United States are currently available for biomass production, and there are a 
number of terrestrial and aquatic crops grown solely as a source for biomass (A Wiselogel, et al. 
Biomass feedstocks resources and composition. In CE Wyman, ed. Handbook on Bioethanol: 
Production and Utilization. Washington, DC: Taylor & Francis, 1996, pp 105-118). Biofuels
produced from biomass include ethanol, methanol, biodiesel, and additives for reformulated
gasoline. Biofuels are desirable because they add little, if any, net carbon dioxide to the atmosphere
and because they greatly reduce ozone formation and carbon monoxide emissions as compared to the 
environmental output of conventional fuels. (P Bergeron. Environmental impacts of bioethanol. In 
CE Wyman, ed. Handbook on Bioethanol: Production and Utilization. Washington, DC: Taylor & 
Francis, 1996, pp 90-103). 

Plant biomass is the most abundant source of carbohydrate in the world due to the lignocellulosic 
materials composing the cell walls of all higher plants. Plant cell walls are divided into two sections,
the primary and the secondary cell walls. The primary cell wall, which provides structure for 
expanding cells (and hence changes as the cell grows), is composed of three major polysaccharides 






NREL 01-38 



and one group of glycoproteins. The predominant polysaccharide, and most abundant source of 
carbohydrates, is cellulose, while hemicellulose and pectin are also found in abundance. Cellulose is 
a linear beta-(1,4)-D-glucan and comprises 20% to 30% of the primary cell wall by weight. The
secondary cell wall, which is produced after the cell has completed growing, also contains
polysaccharides and is strengthened through polymeric lignin covalently cross-linked to
hemicellulose. 

Carbohydrates, and cellulose in particular, can be converted to sugars by well-known methods including
acid and enzymatic hydrolysis. Enzymatic hydrolysis of cellulose requires the processing of biomass to
reduce size and facilitate subsequent handling. Mild acid treatment is then used to hydrolyze part or all
of the hemicellulose content of the feedstock. Finally, cellulose is converted to ethanol through the
concerted action of cellulases and saccharolytic fermentation (simultaneous saccharification and
fermentation (SSF)). The SSF process, using the yeast Saccharomyces cerevisiae for example, is
often incomplete, as it does not utilize the entire sugar content of the plant biomass, namely the
hemicellulose fraction.

The cost of producing ethanol from biomass can be divided into three areas of expenditure:
pretreatment costs, fermentation costs, and other costs. Pretreatment costs include biomass milling,
pretreatment reagents, equipment maintenance, power and water, and waste neutralization and
disposal. The fermentation costs can include enzymes, nutrient supplements, yeast, maintenance and
scale-up, and waste disposal. Other costs include biomass purchase, transportation and storage, plant
labor, plant utilities, ethanol distillation, and administration (which may include technology-use
licenses). One of the major expenses incurred in SSF is the cost of the enzymes, as about one
kilogram of cellulase is required to fully digest 50 kilograms of cellulose. Economical production of
cellulase is also complicated by factors such as the relatively slow growth rates of
cellulase-producing organisms, levels of cellulase expression, and the tendency of enzyme-dependent
processes to partially or completely inactivate enzymes due to conditions such as elevated
temperature, acidity, proteolytic degradation, and solvent degradation.
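
As a rough illustration of the enzyme loading stated above (about one kilogram of cellulase per 50 kilograms of cellulose), the arithmetic can be sketched in Python. The function name and the example feedstock figures are hypothetical and are not taken from the specification; only the 1:50 ratio comes from the text.

```python
# Illustrative arithmetic only. The function name and the example feedstock
# are assumptions; the 50:1 ratio is the figure quoted in the text.
CELLULOSE_PER_KG_CELLULASE = 50.0  # kg cellulose digested per kg cellulase


def cellulase_required_kg(biomass_kg: float, cellulose_fraction: float) -> float:
    """Estimate the cellulase mass needed to digest the cellulose in a feedstock."""
    if biomass_kg < 0:
        raise ValueError("biomass_kg must be non-negative")
    if not 0.0 <= cellulose_fraction <= 1.0:
        raise ValueError("cellulose_fraction must be between 0 and 1")
    cellulose_kg = biomass_kg * cellulose_fraction
    return cellulose_kg / CELLULOSE_PER_KG_CELLULASE


# Hypothetical feedstock: 1000 kg of biomass that is 40% cellulose by weight.
print(round(cellulase_required_kg(1000.0, 0.40), 3))
```

Under these assumed figures the feedstock holds 400 kg of cellulose, so roughly 8 kg of cellulase would be needed, which shows why enzyme cost weighs heavily on the process.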

Enzymatic degradation of cellulose requires the coordinate action of at least three different types of
cellulases. Such enzymes are given an Enzyme Commission (EC) designation according to the
Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (Eur. J.
Biochem. 264: 607-609 and 610-650, 1999). Endo-beta-(1,4)-glucanases (EC 3.2.1.4) cleave the
cellulose strand randomly along its length, thus generating new chain ends. Exo-beta-(1,4)-glucanases
(EC 3.2.1.91) are processive enzymes and cleave cellobiosyl units (beta-(1,4)-glucose
dimers) from free ends of cellulose strands. Lastly, beta-D-glucosidases (cellobiases: EC 3.2.1.21)
hydrolyze cellobiose to glucose. All three of these general activities are required for efficient and
complete hydrolysis of a polymer such as cellulose to a subunit, such as the simple sugar, glucose.
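
The division of labor among the three activities described above can be pictured with a toy Python model. This is a deliberately simplified sketch, not a kinetic model: a cellulose strand is reduced to a count of glucose units, the function names are hypothetical, and a single leftover unit on an odd-length strand is simply counted as glucose.

```python
# Toy bookkeeping model of the three cellulase activities (all names hypothetical).
# A cellulose strand is represented only by its length in glucose units.

def endoglucanase_cut(chain_length: int, position: int) -> list[int]:
    """Cleave one internal bond, giving two shorter strands with new chain ends."""
    if not 0 < position < chain_length:
        raise ValueError("cut position must fall inside the strand")
    return [position, chain_length - position]


def exoglucanase_digest(chain_length: int) -> tuple[int, int]:
    """Release cellobiose (2-unit) pieces from a strand end.

    Returns (number of cellobiose units, leftover glucose units).
    """
    return divmod(chain_length, 2)


def beta_glucosidase(cellobiose_count: int) -> int:
    """Hydrolyze each cellobiose into two glucose molecules."""
    return 2 * cellobiose_count


def hydrolyze(chain_length: int, cut_position: int) -> int:
    """Apply all three activities in turn and return the glucose released."""
    glucose = 0
    for strand in endoglucanase_cut(chain_length, cut_position):
        cellobiose, leftover = exoglucanase_digest(strand)
        glucose += beta_glucosidase(cellobiose) + leftover
    return glucose


print(hydrolyze(10, 3))
```

In this sketch the glucose count after complete hydrolysis equals the original strand length, reflecting that the three activities together convert the whole polymer to its subunit.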

Highly thermostable enzymes have been isolated from the cellulolytic thermophile Acidothermus
cellulolyticus gen. nov., sp. nov., a bacterium originally isolated from decaying wood in an acidic,
thermal pool at Yellowstone National Park. A. Mohagheghi et al., (1986) Int. J. Systematic
Bacteriology, 36(3): 435-443. One cellulase enzyme produced by this organism, the endoglucanase E1,
is known to display maximal activity at 75°C to 83°C. M.P. Tucker et al. (1989), Bio/Technology,
7(8): 817-820. E1 endoglucanase has been described in U.S. Patent 5,275,944. The A. cellulolyticus
E1 endoglucanase is an active cellulase; in combination with the exocellulase CBH I from
Trichoderma reesei, E1 gives a high level of saccharification and contributes to a degree of
synergism. Baker JO et al. (1994), Appl. Biochem. Biotechnol., 45/46: 245-256. The gene coding for the
E1 catalytic and carbohydrate binding domains and linker peptide was described in U.S. Patent 5,536,655.
E1 has also been expressed as a stable, active enzyme from a wide variety of hosts, including E. coli,
Streptomyces lividans, Pichia pastoris, cotton, tobacco, and Arabidopsis (Dai Z, Hooker BS,
Anderson DB, Thomas SR. Transgenic Res. 2000 Feb; 9(1):43-54).



The potential exists for the successful, commercial-scale expression of heterologous cellulases, and
in particular novel cellulases with or without any one or more desirable properties such as thermal
tolerance and resistance to acid inactivation, proteolytic inactivation, and solvent inactivation. Such
expression can occur in filamentous fungi, bacteria, and other hosts.

There is a need within the art to generate alternative cellulase enzymes capable of assisting in the
commercial-scale processing of cellulose to sugar for use in biofuel production. Against this
backdrop the present invention has been developed. The potential exists for the successful,
commercial-scale expression of heterologous cellulase polypeptides, and in particular novel cellulase
polypeptides with or without any one or more desirable properties such as thermal tolerance, and
partial or complete resistance to extreme pH inactivation, proteolytic inactivation, solvent
inactivation, chaotropic agent inactivation, oxidizing agent inactivation, and detergent inactivation.
Such expression can occur in fungi, bacteria, and other hosts.



Summary of the Invention

The present invention provides Guxl, a novel member of the glycoside hydrolase (GH) family of
enzymes, and in particular a thermal tolerant glycoside hydrolase useful in the degradation of
cellulose. Guxl polypeptides of the invention include those having an amino acid sequence shown
in SEQ ID NO:1, as well as polypeptides having substantial amino acid sequence identity to the
amino acid sequence of SEQ ID NO:1 and useful fragments thereof, including a catalytic domain
having significant sequence similarity to the GH48 family, a first carbohydrate binding domain (type
II) and a second carbohydrate binding domain (type III).

The invention also provides a polynucleotide molecule encoding Guxl polypeptides and fragments
of Guxl polypeptides, for example catalytic and carbohydrate binding domains. Polynucleotide
molecules of the invention include those molecules having a nucleic acid sequence as shown in SEQ
ID NO:2; those that hybridize to the nucleic acid sequence of SEQ ID NO:2 under high stringency
conditions; and those having substantial nucleic acid identity with the nucleic acid sequence of SEQ
ID NO:2.

The invention includes variants and derivatives of the Guxl polypeptides, including fusion proteins.
For example, fusion proteins of the invention include Guxl polypeptide fused to a heterologous
protein or peptide that confers a desired function. The heterologous protein or peptide can facilitate
purification, oligomerization, stabilization, or secretion of the Guxl polypeptide, for example. As
further examples, the heterologous polypeptide can provide enhanced activity, including catalytic or
binding activity, for Guxl polypeptides, where the enhancement is either additive or synergistic. A
fusion protein of an embodiment of the invention can be produced, for example, from an expression
construct containing a polynucleotide molecule encoding Guxl polypeptide in frame with a
polynucleotide molecule for the heterologous protein. Embodiments of the invention also comprise
vectors, plasmids, expression systems, host cells, and the like, containing a Guxl polynucleotide
molecule. Genetic engineering methods for the production of Guxl polypeptides of embodiments of
the invention include expression of a polynucleotide molecule in cell free expression systems and in
cellular hosts, according to known methods.

The invention further includes compositions containing a substantially purified Guxl polypeptide of
the invention and a carrier. Such compositions are administered to a biomass containing cellulose
for the reduction or degradation of the cellulose.

The invention also provides reagents, compositions, and methods that are useful for analysis of Guxl 
activity. 

These and various other features as well as advantages which characterize the present invention will 
be apparent from a reading of the following detailed description and a review of the associated 
drawings. 

The following Tables 4 and 5 include sequences used in describing embodiments of the present
invention. In Table 4, the abbreviations are as follows: CD, catalytic domain; CBD_II, carbohydrate
binding domain type II; CBD_III, carbohydrate binding domain type III; and FN-III, fibronectin
domain type III. When used herein, N* indicates a string of unknown nucleic acid units, and X*
indicates a string of unknown amino acid units, for example about 50 or more. Table 4 includes
approximate start and stop information for segments, and Table 5 includes amino acid sequence data
for segments.

Brief Description of the Drawings

FIG. 1 is a schematic representation of the gene sequence and amino acid segment organization. 

FIG. 2 is a graphic representation of the glycoside hydrolase gene/protein families found in various
organisms.

Detailed Description 

Definitions: 

The following definitions are provided to facilitate understanding of certain terms used frequently
herein and are not meant to limit the scope of the present disclosure:

"Amino acid" refers to any of the twenty naturally occurring amino acids as well as any modified
amino acid sequences. Modifications may include natural processes such as posttranslational
processing, or may include chemical modifications which are known in the art. Modifications
include but are not limited to: phosphorylation, ubiquitination, acetylation, amidation, glycosylation,
covalent attachment of flavin, ADP-ribosylation, cross linking, iodination, methylation, and the like.

"Antibody" refers to a Y-shaped molecule having a pair of antigen binding sites, a hinge region and a 
constant region. Fragments of antibodies, for example an antigen binding fragment (Fab), chimeric
antibodies, antibodies having a human constant region coupled to a murine antigen binding region,
and fragments thereof, as well as other well known recombinant antibodies are included in the
present invention. 

"Antisense" refers to polynucleotide sequences that are complementary to a target "sense"
polynucleotide sequence.

"Binding activity" refers to any activity that can be assayed by characterizing the ability of a 
polypeptide to bind to a substrate. The substrate can be a polymer such as cellulose or can be a 
complex molecule or aggregate of molecules where the entire moiety comprises at least some
cellulose.

"Cellulase activity" refers to any activity that can be assayed by characterizing the enzymatic activity
of a cellulase. For example, cellulase activity can be assayed by determining how much reducing
sugar is produced during a fixed amount of time for a set amount of enzyme (see Irwin et al., (1998)
J. Bacteriology, 1709-1714). Other assays are well known in the art and can be substituted.

"Complementary" or "complementarity" refers to the ability of a nucleotide in a polynucleotide
molecule to form a base pair with another nucleotide in a second polynucleotide molecule. For
example, the sequence A-G-T is complementary to the sequence T-C-A. Complementarity may be
partial, in which only some of the nucleotides match according to base pairing, or complete,
where all the nucleotides match according to base pairing.
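
The A-G-T / T-C-A example and the distinction between partial and complete complementarity can be sketched in Python. This is a minimal illustration assuming DNA bases only and position-by-position pairing as in the example above (strand orientation is ignored); the function names are hypothetical.

```python
# Minimal sketch of base-pair complementarity for DNA (A-T, G-C).
# Function names are hypothetical; pairing is checked position by position.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIRS[base] for base in seq.upper())


def fraction_complementary(first: str, second: str) -> float:
    """Fraction of aligned positions at which the two strands can base pair."""
    if len(first) != len(second) or not first:
        raise ValueError("sequences must be non-empty and of equal length")
    paired = sum(1 for x, y in zip(first.upper(), second.upper()) if PAIRS.get(x) == y)
    return paired / len(first)


print(complement("AGT"))
print(fraction_complementary("AGT", "TCA"))  # complete complementarity
print(fraction_complementary("AGT", "TCG"))  # partial complementarity
```

A returned fraction of 1.0 corresponds to complete complementarity; any value between 0 and 1 corresponds to partial complementarity.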







"Expression" refers to transcription and translation occurring within a host cell. The level of 
expression of a DNA molecule in a host cell may be determined on the basis of either the amount of 
corresponding mRNA that is present within the cell or the amount of DNA molecule encoded protein 
produced by the host cell (Sambrook et al., 1989, Molecular cloning: A Laboratory Manual, 18.1- 
18.88). 




"Fusion protein" refers to a first protein having attached a second, heterologous protein. Preferably,
the heterologous protein is fused via recombinant DNA techniques, such that the first and second
proteins are expressed in frame. The heterologous protein can confer a desired characteristic to the
fusion protein, for example, a detection signal, enhanced stability or stabilization of the protein,
facilitated oligomerization of the protein, or facilitated purification of the fusion protein. Examples
of heterologous proteins useful in the fusion proteins of the invention include molecules having one
or more catalytic domains of Guxl, one or more binding domains of Guxl, one or more catalytic
domains of a glycoside hydrolase other than Guxl, one or more binding domains of a glycoside
hydrolase other than Guxl, or any combination thereof. Further examples include immunoglobulin
molecules and portions thereof, peptide tags such as histidine tag (6-His), leucine zipper, substrate
targeting moieties, signal peptides, and the like. Fusion proteins are also meant to encompass
variants and derivatives of Guxl polypeptides that are generated by conventional site-directed
mutagenesis and more modern techniques such as directed evolution, discussed infra.

"Genetically engineered" refers to any recombinant DNA or RNA method used to create a
prokaryotic or eukaryotic host cell that expresses a protein at elevated levels, at lowered levels, or in
a mutated form. In other words, the host cell has been transfected, transformed, or transduced with a
recombinant polynucleotide molecule, and thereby been altered so as to cause the cell to alter
expression of the desired protein. Methods and vectors for genetically engineering host cells are well
known in the art; for example various techniques are illustrated in Current Protocols in Molecular
Biology, Ausubel et al., eds. (Wiley & Sons, New York, 1988, and quarterly updates). Genetic
engineering techniques include but are not limited to expression vectors, targeted homologous
recombination and gene activation (see, for example, U.S. Patent No. 5,272,071 to Chappel) and
trans activation by engineered transcription factors (see, for example, Segal et al., 1999, Proc Natl
Acad Sci USA 96(6):2758-63).



"Glycoside hydrolase family" refers to a family of enzymes which hydrolyze the glycosidic bond
between two or more carbohydrates or between a carbohydrate and a non-carbohydrate moiety
(Henrissat B., (1991) Biochem. J., 280:309-316). Identification of a putative glycoside hydrolase
family member is made based on an amino acid sequence comparison and the finding of significant
sequence similarity within the putative member's catalytic domain, as compared to the catalytic
domains of known family members.




"Homology" refers to a degree of complementarity between polynucleotides, having significant 
effect on the efficiency and strength of hybridization between polynucleotide molecules. The term 
also can refer to a degree of similarity between polypeptides. Two polypeptides having greater than 
or equal to about 60% similarity are presumptively homologous. 



"Host cell" or "host cells" refers to cells expressing a heterologous polynucleotide molecule. Host
cells of the present invention express polynucleotides encoding Guxl or a fragment thereof.
Examples of suitable host cells useful in the present invention include, but are not limited to,
prokaryotic and eukaryotic cells. Specific examples of such cells include bacteria of the genera
Escherichia, Bacillus, and Salmonella, as well as members of the genera Pseudomonas, Streptomyces,
and Staphylococcus; fungi, particularly filamentous fungi such as Trichoderma and Aspergillus,
Phanerochaete chrysosporium and other white rot fungi; also other fungi including Fusaria, molds,
and yeast including Saccharomyces sp., Pichia sp., and Candida sp. and the like; plants e.g.
Arabidopsis, cotton, barley, tobacco, potato, and aquatic plants and the like; SF9 insect cells (Summers
and Smith, 1987, Texas Agriculture Experiment Station Bulletin, 1555), and the like. Other specific
examples include mammalian cells such as human embryonic kidney cells (293 cells), Chinese
hamster ovary (CHO) cells (Puck et al., 1958, Proc. Natl. Acad. Sci. USA 60, 1275-1281), human
cervical carcinoma cells (HELA) (ATCC CCL 2), human liver cells (Hep G2) (ATCC HB8065),
human breast cancer cells (MCF-7) (ATCC HTB22), human colon carcinoma cells (DLD-1) (ATCC
CCL 221), Daudi cells (ATCC CRL-213), murine myeloma cells such as P3/NSI/1-Ag4-1 (ATCC
TIB-18), P3X63Ag8 (ATCC TIB-9), SP2/0-Ag14 (ATCC CRL-1581) and the like.



"Hybridization" refers to the pairing of complementary polynucleotides during an annealing period.
The strength of hybridization between two polynucleotide molecules is impacted by the homology
between the two molecules, stringency of the conditions involved, the melting temperature of the
formed hybrid and the G:C ratio within the polynucleotides.

"Identity" refers to a comparison between pairs of nucleic acid or amino acid molecules. Methods
for determining sequence identity are known. See, for example, computer programs commonly
employed for this purpose, such as the Gap program (Wisconsin Sequence Analysis Package, Version 8
for Unix, Genetics Computer Group, University Research Park, Madison Wisconsin), that uses the
algorithm of Smith and Waterman, 1981, Adv. Appl. Math., 2: 482-489.
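
For illustration, percent identity over an alignment that has already been computed can be sketched in Python as follows. This is not the Gap program or the Smith and Waterman algorithm; it assumes the two sequences are supplied pre-aligned with "-" marking gaps, and it counts gapped columns in the denominator, which is only one of several conventions in use. The function name is hypothetical.

```python
# Minimal percent-identity sketch over a precomputed pairwise alignment.
# Assumptions: inputs are already aligned, "-" marks a gap, and columns with a
# gap in one sequence count as mismatches. This is not the Gap program.

def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Return the percentage of alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no residue in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not taken from SEQ ID NO:1 or SEQ ID NO:2.
print(round(percent_identity("ACGT-A", "ACCTGA"), 1))
```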


"Isolated" refers to a polynucleotide or polypeptide that has been separated from at least one 
contaminant (polynucleotide or polypeptide) with which it is normally associated. For example, an 
isolated polynucleotide or polypeptide is in a context or in a form that is different from that in which 
it is found in nature. 


"Nucleic acid sequence" refers to the order or sequence of deoxyribonucleotides along a strand of 
deoxyribonucleic acid. The order of these deoxyribonucleotides determines the order of amino acids 
along a polypeptide chain. The deoxyribonucleotide sequence thus codes for the amino acid 
sequence. 
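
The relationship between a deoxyribonucleotide sequence and the amino acid sequence it codes for can be sketched in Python. The sketch assumes the standard genetic code and a coding-strand DNA input read in frame from its first base; the function name and the example sequence are hypothetical and are not drawn from SEQ ID NO:2.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
# Assumption: the organism uses the standard code; "*" denotes a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA in frame, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical 12-base sequence: ATG GGT TGG TAA
print(translate("ATGGGTTGGTAA"))
```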


"Polynucleotide" refers to a linear sequence of nucleotides. The nucleotides may be ribonucleotides, 
or deoxyribonucleotides, or a mixture of both. Examples of polynucleotides in the context of the
present invention include single and double stranded DNA, single and double stranded RNA, and
hybrid molecules having mixtures of single and double stranded DNA and RNA. The
polynucleotides of the present invention may contain one or more modified nucleotides.

"Protein," "peptide," and "polypeptide" are used interchangeably to denote an amino acid polymer or
a set of two or more interacting or bound amino acid polymers.

"Purify," or "purified" refers to a target protein that is free from at least 5-10% of contaminating
proteins. Purification of a protein from contaminating proteins can be accomplished using known
techniques, including ammonium sulfate or ethanol precipitation, acid precipitation, heat precipitation, 
anion or cation exchange chromatography, phosphocellulose chromatography, hydrophobic interaction 
chromatography, affinity chromatography, hydroxylapatite chromatography, size-exclusion 
chromatography, and lectin chromatography. Various protein purification techniques are illustrated in 
Current Protocols in Molecular Biology, Ausubel et al., eds. (Wiley & Sons, New York, 1988, and 
quarterly updates).

"Selectable marker" refers to a marker that identifies a cell as having undergone a recombinant DNA
or RNA event. Selectable markers include, for example, genes that encode antimetabolite resistance
such as the DHFR protein that confers resistance to methotrexate (Wigler et al., 1980, Proc Natl Acad
Sci USA 77:3567; O'Hare et al., 1981, Proc Natl Acad Sci USA, 78:1527), the GPT protein that
confers resistance to mycophenolic acid (Mulligan & Berg, 1981, PNAS USA, 78:2072), the
neomycin resistance marker that confers resistance to the aminoglycoside G-418 (Colbere-Garapin
et al., 1981, J Mol Biol, 150:1), the Hygro protein that confers resistance to hygromycin (Santerre et
al., 1984, Gene 30:147), and the Zeocin™ resistance marker (Invitrogen). In addition, the herpes
simplex virus thymidine kinase, hypoxanthine-guanine phosphoribosyltransferase and adenine
phosphoribosyltransferase genes can be employed in tk-, hgprt- and aprt- cells, respectively.

"Stringency" refers to the conditions (temperature, ionic strength, solvents, etc) under which 
hybridization between polynucleotides occurs. A hybridization reaction conducted under high
stringency conditions is one that will only occur between polynucleotide molecules that have a high
degree of complementary base pairing (85% to 100% identity). Conditions for high stringency
hybridization, for example, may include an overnight incubation at about 42°C for about 2.5 hours in 6
X SSC/0.1% SDS, followed by washing of the filters in 1.0 X SSC at 65°C, 0.1% SDS. A
hybridization reaction conducted under moderate stringency conditions is one that will occur
between polynucleotide molecules that have an intermediate degree of complementary base pairing
(50% to 84% identity).
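
The identity ranges given above for high and moderate stringency can be expressed as a simple lookup, sketched here in Python. The function name is hypothetical, and the treatment of values falling between 84% and 85% (assigned to the moderate class) is an assumption, since the text states the ranges in whole percentages.

```python
# Sketch mapping percent identity to the stringency class named in the text.
# Assumptions: hypothetical function name; values between 84% and 85% are
# treated as moderate; identities below 50% fall outside both stated ranges.

def stringency_class(percent_identity: float) -> str:
    """Return the most stringent condition under which hybridization is expected."""
    if not 0.0 <= percent_identity <= 100.0:
        raise ValueError("percent_identity must be between 0 and 100")
    if percent_identity >= 85.0:
        return "high stringency"
    if percent_identity >= 50.0:
        return "moderate stringency"
    return "below moderate stringency"


for identity in (92.0, 70.0, 30.0):
    print(identity, stringency_class(identity))
```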

"Substrate targeting moiety" refers to any signal on a substrate, either naturally occurring or
genetically engineered, used to target any Guxl polypeptide or fragment thereof to a substrate. Such
targeting moieties include ligands that bind to a substrate structure. Examples of ligand/receptor
pairs include carbohydrate binding domains and cellulose. Many such substrate-specific ligands are
known and are useful in the present invention to target a Guxl polypeptide or fragment thereof to a
substrate. A novel example is a Guxl carbohydrate binding domain that is used to tether other
molecules to a cellulose-containing substrate such as a fabric.

"Thermal tolerant" refers to the property of withstanding partial or complete inactivation by heat and
can also be described as thermal resistance or thermal stability. Although some variation exists in
the literature, the following definitions can be considered typical for the optimum temperature range
of stability and activity for enzymes: psychrophilic (below freezing to 10°C); mesophilic (10°C to
50°C); thermophilic (50°C to 75°C); and caldophilic (75°C to above boiling water temperature).
The stability and catalytic activity of enzymes are linked characteristics, and the ways of measuring
these properties vary considerably. For industrial enzymes, stability and activity are best measured
under use conditions, often in the presence of substrate. Therefore, cellulases that must act on
process streams of cellulose must be able to withstand exposure up to thermophilic or even
caldophilic temperatures for digestion times in excess of several hours.
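
The typical temperature classes listed above can be written as a small Python classifier. The function name is hypothetical, and because the stated ranges share their end points (10°C, 50°C, and 75°C), assigning each shared boundary to the warmer class is an assumption made only for this sketch.

```python
# Sketch of the enzyme temperature classes described in the text.
# Assumptions: hypothetical function name; each shared boundary temperature
# (10, 50, 75 degrees C) is assigned to the warmer class.

def temperature_class(optimum_celsius: float) -> str:
    """Classify an enzyme by its optimum temperature of stability and activity."""
    if optimum_celsius < 10.0:
        return "psychrophilic"
    if optimum_celsius < 50.0:
        return "mesophilic"
    if optimum_celsius < 75.0:
        return "thermophilic"
    return "caldophilic"


for optimum in (5.0, 37.0, 60.0, 80.0):
    print(optimum, temperature_class(optimum))
```

Under this scheme an enzyme with an optimum of 80°C falls in the caldophilic class.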

In encompassing a wide variety of potential applications for embodiments of the present invention,
thermal tolerance refers to the ability to function in a temperature range of from about 15°C to about
100°C. A preferred range is from about 30°C to about 80°C. A highly preferred range is from about
50°C to about 70°C. For example, a protein that can function at about 45°C is considered in the
preferred range even though it may be susceptible to partial or complete inactivation at temperatures
in a range above about 45°C and less than about 80°C. For polypeptides derived from organisms
such as Acidothermus, the desirable property of thermal tolerance is often accompanied by
other desirable characteristics such as: resistance to extreme pH degradation, resistance to solvent
degradation, resistance to proteolytic degradation, resistance to detergent degradation, resistance to
oxidizing agent degradation, resistance to chaotropic agent degradation, and resistance to general
degradation. Cowan DA in Danson MJ et al. (1992) The Archaebacteria, Biochemistry and
Biotechnology at 149-159, University Press, Cambridge, ISBN 1855780100. Here 'resistance' is
intended to include any partial or complete level of residual activity. When a polypeptide is
described as thermal tolerant it is understood that any one, more than one, or none of these other
desirable properties can be present.

"Variant", as used herein, means a polynucleotide or polypeptide molecule that differs from a 
reference molecule. Variants can include nucleotide changes that result in amino acid substitutions, 
deletions, fusions, or truncations in the resulting variant polypeptide when compared to the reference 
polypeptide. 

"Vector," "extra-chromosomal vector" or "expression vector" refers to a first polynucleotide 
molecule, usually double-stranded, which may have inserted into it a second polynucleotide 
molecule, for example a foreign or heterologous polynucleotide. The heterologous polynucleotide 
molecule may or may not be naturally found in the host cell, and may be, for example, one or more
additional copies of the heterologous polynucleotide naturally present in the host genome. The vector
is adapted for transporting the foreign polynucleotide molecule into a suitable host cell. Once in the 
host cell, the vector may be capable of integrating into the host cell chromosomes. The vector may 
optionally contain additional elements for selecting cells containing the integrated polynucleotide 
molecule as well as elements to promote transcription of mRNA from transfected DNA. Examples 
of vectors useful in the methods of the present invention include, but are not limited to, plasmids, 
bacteriophages, cosmids, retroviruses, and artificial chromosomes. 

Within the application, unless otherwise stated, the techniques utilized may be found in any of
several well-known references, such as: Molecular Cloning: A Laboratory Manual (Sambrook et al.
(1989) Molecular cloning: A Laboratory Manual), Gene Expression Technology (Methods in
Enzymology, Vol. 185, edited by D. Goeddel, 1991 Academic Press, San Diego, CA), "Guide to
Protein Purification" in Methods in Enzymology (M.P. Deutscher, ed., (1990) Academic Press, Inc.),
PCR Protocols: A Guide to Methods and Applications (Innis et al. (1990) Academic Press, San
Diego, CA), Culture of Animal Cells: A Manual of Basic Technique, 2nd ed. (R.I. Freshney (1987)
Liss, Inc., New York, NY), and Gene Transfer and Expression Protocols (pp 109-128, ed. E.J.
Murray, The Humana Press Inc., Clifton, N.J.).

O-Glycoside Hydrolases: 

Glycoside hydrolases are a large and diverse family of enzymes that hydrolyse the glycosidic bond 
between two carbohydrate moieties or between a carbohydrate and a non-carbohydrate moiety (See 
FIG. 2). Glycoside hydrolase enzymes are classified into glycoside hydrolase (GH) families based 
on significant amino acid similarities within their catalytic domains. Enzymes having related 
catalytic domains are grouped together within a family, (Henrissat et al., (1991) supra, and Henrissat 
et al. (1996), Biochem. J. 316:695-696), where the underlying classification provides a direct 
relationship between the GH domain amino acid sequence and how a GH domain will fold. This 
information ultimately provides a common mechanism for how the enzyme will hydrolyse the 
glycosidic bond within a substrate, i.e., either by a retaining mechanism or inverting mechanism
(Henrissat, B., (1991) supra).



Cellulases belong to the GH family of enzymes. Cellulases are produced by a variety of bacteria and
fungi to degrade the β-1,4 glycosidic bond of cellulose and to so produce successively smaller
fragments of cellulose and ultimately produce glucose. At present, cellulases are found within at
least 11 different GH families. Three different types of cellulase enzyme activities have been
identified within these GH families: exo-acting cellulases which cleave successive disaccharide units
from the non-reducing ends of a cellulose chain; endo-acting cellulases which randomly cleave
successive disaccharide units within the cellulose chain; and β-glucosidases which cleave successive
disaccharide units to glucose (J. W. Deacon, (1997) Modern Mycology, 3rd Ed., ISBN: 0-632-03077-1,
97-98).

Many cellulases are characterized by having a multiple domain unit within their overall structure: a
GH or catalytic domain is joined to a carbohydrate-binding domain (CBD) by a glycosylated linker
peptide (see FIG. 1) (Koivula et al., (1996) Protein Expression and Purification 8:391-400). As
noted above, cellulases do not belong to any one family of GH domains, but rather have been
identified within at least 11 different GH families to date. The CBD type domain increases the
concentration of the enzyme on the substrate, in this case cellulose, and the linker peptide provides
flexibility for both larger domains.

Conversion of cellulose to glucose is an essential step in the production of ethanol or other biofuels 
from biomass. Cellulases are an important component of this process, where approximately one 
kilogram of cellulase can digest fifty kilograms of cellulose. Within this process, thermostable 
cellulases have taken precedence, due to their ability to function at elevated temperatures and under
other conditions including pH extremes, solvent presence, detergent presence, proteolysis, etc. (see 
Cowan DA (1992), supra). 

Highly thermostable cellulase enzymes are secreted by the cellulolytic thermophile Acidothermus
cellulolyticus (U.S. Patent Nos. 5,275,944 and 5,110,735). This bacterium was originally isolated
from decaying wood in an acidic, thermal pool at Yellowstone National Park and deposited with the
American Type Culture Collection (ATCC 43068) (Mohagheghi et al., (1986) Int. J. System.
Bacteriol. 36:435-443).

Recently, a thermostable cellulase, E1 endoglucanase, was identified and characterized from
Acidothermus cellulolyticus (U.S. Patent No. 5,536,655). The E1 endoglucanase has maximal
activity between 75 and 83°C and is active to a pH well below 5. Thermostable cellulases, and E1
endoglucanase, are useful in the conversion of biomass to biofuels, and in particular, are useful in the
conversion of cellulose to glucose. Conversion of biomass to biofuel represents an extremely
important alternative fuel source that is more environmentally friendly than conventional fuels, and
provides a use, in some cases, for waste products.

Guxl: 

As described more fully in the Examples below, Guxl, a novel thermostable cellulase, has now been
identified and characterized. The predicted amino acid sequence of Guxl (SEQ ID NO:1) has an
organization characteristic of a cellulase enzyme. Guxl contains a carbohydrate binding domain -
linker domain - catalytic domain - linker domain - fibronectin domain - linker domain - carbohydrate
binding domain unit. In particular, this unit includes a carbohydrate binding domain
type III (amino acids from about A35 to about A187), a GH48 catalytic domain (amino acids from
about N231 to about P870), and a CBD II (amino acids from about G1021 to about S1121). As
discussed in more detail below, significant amino acid similarity of Guxl to other cellulases
identifies Guxl as a cellulase.




Guxl, as noted above, has a catalytic domain identified as belonging to the GH48 family. The
GH48 domain family includes a number of exoglucanases, for example, from Cellulomonas fimi,
and exoglucanase E3 isolated from Thermobifida fusca. The GH48 members degrade substrate
using an inverting mechanism. Being a member of the GH48 family of proteins identifies Guxl as
potentially having exoglucanase activity. In addition, the predicted amino acid sequence (SEQ ID
NO:1) indicates that CBD type II and CBD type III domains are present as characterized by Tomme
P. et al. (1995) in Enzymatic Degradation of Insoluble Polysaccharides (Saddler JN & Penner M,
eds.), at 142-163, American Chemical Society, Washington. See also Tomme, P. & Claeyssens, M.
(1989) FEBS Lett. 243, 239-243; Gilkes, N.R. et al., (1988) J. Biol. Chem. 263, 10401-10407.

Guxl is also a thermostable cellulase as it is produced by the thermophile Acidothermus
cellulolyticus. As discussed, Guxl polypeptides can have other desirable characteristics (see Cowan
DA (1992), supra). Like other members of the cellulase family, and in particular thermostable
cellulases, Guxl polypeptides are useful in the conversion of biomass to biofuels and biofuel
additives, and in particular, biofuels from cellulose. It is envisioned that Guxl polypeptides could be
used for other purposes, for example in detergents, pulp and paper processing, food and feed
processing, and in textile processes. Guxl polypeptides can be used alone or in combination with
one or more other cellulases or glycoside hydrolases to perform the uses described herein or known
within the relevant art, all of which are within the scope of the present disclosure.



Guxl Polypeptides: 

Guxl polypeptides of the invention include isolated polypeptides having an amino acid sequence as
shown below in Example 1, Table 1 and in SEQ ID NO:1, as well as variants and derivatives,
including fragments, having substantial identity to the amino acid sequence of SEQ ID NO:1 and that
retain any of the functional activities of Guxl. Guxl polypeptide activity can be determined, for
example, by subjecting the variant, derivative, or fragment to a substrate binding assay or a cellulase
activity assay such as those described in Irwin D et al., J. Bacteriology 180(7): 1709-1714 (April 1998).

Table 1. Guxl amino acid sequence (SEQ ID NO:1)

MPGLRRRLRAGIVSAAALGSLVSGLVAVAPVAHAAVTLKAQYKNNDSAPS
DNQIKPGLQLVNTGSSSVDLSTVTVRYWFTRDGGSSTLVYNCDWAAMGCG
NIRASFGSVNPATPTADTYLQLSFTGGTLAAGGSTGEIQNRVNKSDWSNF
DETNDYSYGTNTTFQDWTKVTVYVNGVLVWGTEPSGATASPSASATPSPS
SSPTTSPSSSPSPSSSPTPTPSSSSPPPSSNDPYIQRFLTMYNKIHDPAN
GYFSPQGIPYHSVETLIVEAPDYGHETTSEAYSFWLWLEATYGAVTGNWT
PFNNAWTTMETYMIPQHADQPNNASYNPNSPASYAPEEPLPSMYPVAIDS
SVPVGHDPLAAELQSTYGTPDIYGMHWLADVDNIYGYGDSPGGGCELGPS
AKGVSYINTFQRGSQESVWETVTQPTCDNGKYGGAHGYVDLFIQGSTPPQ
WKYTDAPDADARAVQAAYWAYTWASAQGKASAIAPTIAKASQTGDYLRYS
LFDKYFKQVGNCYPASSCPGATGRQSETYLIGWYYAWGGSSQGWAWRIGD
GAAHFGYQNPLAAWAMSNVTPLIPLSPTAKSDWAASLQRQLEFYQWLQSA
EGAIAGGATNSWNGNYGTPPAGDSTFYGMAYDWEPVYHDPPSNNWFGFQA
WSMERVAEYYYVTGDPKAKALLDKWVAWVKPNVTTGASWSIPSNLSWSGQ
PDTWNPSNPGTNANLHVTITSSGQDVGVAAALAKTLEYYAAKSGDTASRD
LAKGLLDSMWNNDQDSLGVSTPETRTDYSRFTQVYDPTTGDGLYIPSGWT
GTMPNGDQIKPGATFLSIRSWYTKDPQWSKVQAYLNGGPAPTFNYHRFWA
ESDFAMANADFGMLFPSGSPSPTPSPTPTSSPSPTPSSSPTPSPSPSPTG
DTTPPSVPTGLQVTGTTTSSVSLSWTASTDNVGVAHYNVYRNGTLVGQPT
ATSFTDTGLAAGTSYTYTVAAVDAAGNTSAQSFAGDSDDGIAVASPSPSP
TPTSSPSPTPSPTPSPTSTSGASCTATYVVNSDWGSGFTTTVTVTNTGTR
ATSGWTVTWSFAGNQTVTNYWNTALTQSGKSVTAKNLSYNNVIQPGQSTT
FGFNGSYSGTNTAPTLSCTAS

As listed and described in Tables 1 and 5, the isolated Guxl polypeptide includes an N-terminal
hydrophobic region that functions as a signal peptide, having an amino acid sequence that begins with
Met1 and extends to about A34; a carbohydrate binding domain having sequence similarity to such type
III domains that begins with about A35 and extends to about A187; a catalytic domain having
significant sequence similarity to a GH48 family domain that begins with about N231 and extends to
about P870; a fibronectin type III domain that begins with about D901 and extends to about G985; and a
carbohydrate binding domain type II region that begins with about G1021 and extends to about S1121.
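
The domain boundaries recited above can be held in a small lookup table. The following Python sketch is illustrative only: the names DOMAINS, domain_length, extract_domain, and linker_regions are hypothetical, and the coordinates are the approximate ("about") residue positions given in the text, read as 1-based inclusive numbers.

```python
# Approximate Guxl domain boundaries as recited in the text.
# Residue numbers are 1-based and inclusive; every boundary is "about".
DOMAINS = [
    ("signal peptide", 1, 34),
    ("CBD type III", 35, 187),
    ("GH48 catalytic domain", 231, 870),
    ("fibronectin type III domain", 901, 985),
    ("CBD type II", 1021, 1121),
]


def domain_length(name):
    """Return the number of residues spanned by the named domain."""
    for label, start, end in DOMAINS:
        if label == name:
            return end - start + 1
    raise KeyError(name)


def extract_domain(sequence, name):
    """Slice the named domain out of a full-length amino acid sequence."""
    for label, start, end in DOMAINS:
        if label == name:
            if len(sequence) < end:
                raise ValueError("sequence is shorter than the domain end")
            return sequence[start - 1:end]
    raise KeyError(name)


def linker_regions():
    """Return (start, end) residue ranges lying between consecutive domains."""
    gaps = []
    for (_, _, prev_end), (_, next_start, _) in zip(DOMAINS, DOMAINS[1:]):
        if next_start > prev_end + 1:
            gaps.append((prev_end + 1, next_start - 1))
    return gaps
```

Under these assumed coordinates, the ranges returned by linker_regions() are the stretches left over between the listed domains, which is one way to locate the linker sequences that connect them.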
Variants and derivatives of Guxl include, for example, Guxl polypeptides modified by covalent or
aggregative conjugation with other chemical moieties, such as glycosyl groups, polyethylene glycol
(PEG) groups, lipids, phosphate, acetyl groups, and the like.



The amino acid sequence of Guxl polypeptides of the invention is preferably at least about 60%
identical, more preferably at least about 70% identical, or in some embodiments at least about 90%
identical, to the Guxl amino acid sequence shown above in Table 1 and SEQ ID NO:1. The percentage
identity, also termed homology (see definition above), can be readily determined, for example, by
comparing the two polypeptide sequences using any of the computer programs commonly employed for
this purpose, such as the Gap program (Wisconsin Sequence Analysis Package, Version 8 for Unix,
Genetics Computer Group, University Research Park, Madison Wisconsin), which uses the algorithm
of Smith and Waterman, 1981, Adv. Appl. Math. 2: 482-489.
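
As a rough illustration of how a percentage identity might be read off once two sequences have been aligned, the Python sketch below counts matching columns in a pre-computed alignment. It is not the Gap program and performs no alignment itself; the function names, the "-" gap character, and the choice of denominator (all aligned columns) are assumptions, and alignment programs differ in how they define the denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of aligned columns that hold the same residue in both sequences.

    Both inputs must already be aligned (equal length, gaps marked with `gap`).
    Columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if not (a == gap and b == gap)]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def highest_tier(identity, tiers=(60.0, 70.0, 90.0)):
    """Return the highest identity threshold met, or None if none is met."""
    met = [tier for tier in tiers if identity >= tier]
    return max(met) if met else None
```

The default tiers simply mirror the 60%, 70%, and 90% levels recited in the text.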

Variants and derivatives of the Guxl polypeptide may further include, for example, fusion proteins
formed of a Guxl polypeptide and a heterologous polypeptide. Preferred heterologous polypeptides
include those that facilitate purification, oligomerization, stability, or secretion of the Guxl
polypeptides.

Guxl polypeptide fragments may include, but are not limited to, the polypeptide sequences listed in
Table 4, SEQ ID NOS: 3, 4, 5, 6, and 7.

Guxl polypeptide variants and derivatives, as used in the description of the invention, can contain
conservatively substituted amino acids, meaning that one or more amino acids can be replaced by an
amino acid that does not alter the secondary and/or tertiary structure of the polypeptide. Such
substitutions can include the replacement of an amino acid by a residue having similar
physicochemical properties, such as substituting one aliphatic residue (Ile, Val, Leu, or Ala) for
another, or substitutions between basic residues Lys and Arg, acidic residues Glu and Asp, amide
residues Gln and Asn, hydroxyl residues Ser and Tyr, or aromatic residues Phe and Tyr. Phenotypically
silent amino acid exchanges are described more fully in Bowie et al., 1990, Science 247:1306-1310. In
addition, functional Guxl polypeptide variants include those having amino acid substitutions, deletions,
or additions to the amino acid sequence outside functional regions of the protein, for example, outside
the catalytic and carbohydrate binding domains. These would include, for example, the various linker
sequences that connect functional domains as defined herein.
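
The residue groupings named above lend themselves to a simple lookup. The following Python sketch is one possible way to flag substitutions that stay within a group; the names CONSERVATIVE_GROUPS, shared_groups, is_conservative, and classify_variant are hypothetical, and the groups are taken exactly as recited in the text rather than from any substitution matrix.

```python
# Residue groups as recited in the text, in one-letter codes.
CONSERVATIVE_GROUPS = {
    "aliphatic": set("IVLA"),   # Ile, Val, Leu, Ala
    "basic": set("KR"),         # Lys, Arg
    "acidic": set("ED"),        # Glu, Asp
    "amide": set("QN"),         # Gln, Asn
    "hydroxyl": set("SY"),      # Ser, Tyr
    "aromatic": set("FY"),      # Phe, Tyr
}


def shared_groups(residue_a, residue_b):
    """Return the sorted names of groups containing both residues."""
    return sorted(name for name, members in CONSERVATIVE_GROUPS.items()
                  if residue_a in members and residue_b in members)


def is_conservative(original, replacement):
    """True if two different residues fall within at least one common group."""
    return original != replacement and bool(shared_groups(original, replacement))


def classify_variant(reference, variant):
    """List (position, original, replacement, conservative) for each difference.

    Positions are 1-based; the two sequences must have equal length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [(i, a, b, is_conservative(a, b))
            for i, (a, b) in enumerate(zip(reference, variant), start=1)
            if a != b]
```

A check of this kind only screens by residue class; it says nothing about whether a given exchange actually preserves structure or activity, which the text leaves to the binding and activity assays.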

The Guxl polypeptides of the present invention are preferably provided in an isolated form, and
preferably are substantially purified. The polypeptides may be recovered and purified from
recombinant cell cultures by known methods, including, for example, ammonium sulfate or ethanol
precipitation, anion or cation exchange chromatography, phosphocellulose chromatography,
hydrophobic interaction chromatography, affinity chromatography, hydroxylapatite chromatography,
and lectin chromatography. Preferably, high performance liquid chromatography (HPLC) is employed
for purification.

Another preferred form of Guxl polypeptides is that of recombinant polypeptides as expressed by
suitable hosts. Furthermore, the hosts can simultaneously produce other cellulases such that a mixture
is produced comprising a Guxl polypeptide and one or more other cellulases. Such a mixture can be
effective in crude fermentation processing or other industrial processing.

Guxl polypeptides can be fused to heterologous polypeptides to facilitate purification. Many available
heterologous peptides (peptide tags) allow selective binding of the fusion protein to a binding partner.
Non-limiting examples of peptide tags include 6-His, thioredoxin, hemagglutinin, GST, and the OmpA
signal sequence tag. A binding partner that recognizes and binds to the heterologous peptide can be any
molecule or compound, including metal ions (for example, metal affinity columns), antibodies,
antibody fragments, or any protein or peptide that preferentially binds the heterologous peptide to
permit purification of the fusion protein.



Guxl polypeptides can be modified to facilitate formation of Guxl oligomers. For example, Guxl
polypeptides can be fused to peptide moieties that promote oligomerization, such as leucine zippers and
certain antibody fragment polypeptides, for example, Fc polypeptides. Techniques for preparing these
fusion proteins are known, and are described, for example, in WO 99/31241 and in Cosman et al., 2001
Immunity 14:123-133. Fusion to an Fc polypeptide offers the additional advantage of facilitating
purification by affinity chromatography over Protein A or Protein G columns. Fusion to a
leucine-zipper (LZ), for example, a repetitive heptad repeat, often with four or five leucine residues
interspersed with other amino acids, is described in Landschulz et al., 1988, Science 240:1759.



It is also envisioned that an expanded set of variants and derivatives of Guxl polynucleotides and/or
polypeptides can be generated to select for useful molecules, where such expansion is achieved not only
by conventional methods such as site-directed mutagenesis (SDM) but also by more modern
techniques, either independently or in combination.



Site-directed mutagenesis is considered an informational approach to protein engineering and can rely
on high-resolution crystallographic structures of target proteins and some stratagem for specific amino
acid changes (Van Den Burg, B.; Vriend, G.; Veltman, O.R.; Venema, G.; Eijsink, V.G.H. Proc. Natl.
Acad. Sci. U.S.A. 1998, 95, 2056-2060). For example, modification of the amino acid sequence of Guxl
polypeptides can be accomplished as is known in the art, such as by introducing mutations at particular
locations by oligonucleotide-directed mutagenesis (Walder et al., 1986, Gene 42:133; Bauer et al.,
1985, Gene 37:73; Craik, 1985, BioTechniques, 12-19; Smith et al., 1981, Genetic Engineering:
Principles and Methods, Plenum Press; and U.S. Patent No. 4,518,584 and U.S. Patent No. 4,737,462).
SDM technology can also employ the recent advent of computational methods for identifying
site-specific changes for a variety of protein engineering objectives (Hellinga, H.W. Nature Struct.
Biol. 1998, 5, 525-527).

The more modern techniques include, but are not limited to, non-informational mutagenesis techniques
(referred to generically as "directed evolution"). Directed evolution, in conjunction with
high-throughput screening, allows testing of statistically meaningful variations in protein conformation
(Arnold, F.H. Nature Biotechnol. 1998, 16, 617-618). Directed evolution technology can include
diversification methods similar to that described by Crameri A. et al. (1998, Nature 391: 288-291),
site-saturation mutagenesis, staggered extension process (StEP) (Zhao, H.; Giver, L.; Shao, Z.; Affholter,
J.A.; Arnold, F.H. Nature Biotechnol. 1998, 16, 258-262), and DNA synthesis/reassembly (U.S. Patent
5,965,408).

Fragments of the Guxl polypeptide can be used, for example, to generate specific anti-Guxl
antibodies. Using known selection techniques, specific epitopes can be selected and used to generate
monoclonal or polyclonal antibodies. Such antibodies have utility in the assay of Guxl activity as well
as in purifying recombinant Guxl polypeptides from genetically engineered host cells.

Guxl Polynucleotides: 

The invention also provides polynucleotide molecules encoding the Guxl polypeptides discussed
above. Guxl polynucleotide molecules of the invention include polynucleotide molecules having
the nucleic acid sequence shown in Table 2 and SEQ ID NO:2; polynucleotide molecules that
hybridize to the nucleic acid sequence of Table 2 and SEQ ID NO:2 under high stringency
hybridization conditions (for example, 42°C, 2.5 hr., 6X SSC, 0.1% SDS); and polynucleotide
molecules having substantial nucleic acid sequence identity with the nucleic acid sequence of Table
2 and SEQ ID NO:2, particularly with those nucleic acids encoding the catalytic domain, GH48
(from about amino acid N231 to about P870), the carbohydrate binding domain III (from about
amino acid A35 to about A187) and carbohydrate binding domain II (from about G1021 to about amino
acid S1121).

Table 2. Guxl nucleotide sequence (SEQ ID NO:2).

ATGCCAGGATTACGACGGCGACTCCGCGCCGGTATCGTCTCGGCGGCGGCGTTGGGGTCGCTGGTTAGCGG
GCTCGTTGCCGTCGCACCAGTCGCGCACGCGGCGGTGACTCTCAAAGCGCAGTATAAGAACAATGATTCGG
CGCCGAGTGACAACCAGATCAAACCGGGTCTCCAGTTGGTGAATACCGGGTCGTCGTCGGTGGATTTGTCG
ACGGTGACGGTGCGGTACTGGTTCACCCGGGATGGTGGGTCGTCGACACTGGTGTACAACTGTGACTGGGC
GGCGATGGGGTGTGGGAATATCCGCGCCTCGTTCGGCTCGGTGAACCCGGCGACGCCGACGGCGGACACC
TACCTGCAGTTGTCGTTCACTGGTGGAACGTTGGCCGCTGGTGGGTCGACGGGTGAGATTCAAAACCGGGT
GAATAAGAGTGACTGGTCGAACTTTGATGAGACCAATGACTACTCGTATGGGACGAACACCACCTTCCAGG
ACTGGACGAAGGTGACGGTGTACGTCAACGGCGTGTTGGTCTGGGGGACCGAACCGTCCGGAGCGACGGC
GTCTCCATCCGCGTCGGCGACGCCCAGCCCGTCCAGTTCACCGACCACGAGTCCGAGTTCGTCCCCGTCGCC
GAGCAGCAGCCCGACGCCGACACCGAGCAGCTCGTCGCCGCCCCCGTCGTCCAACGACCCGTACATCCAGCG
GTTCCTCACGATGTACAACAAGATTCACGACCCAGCGAACGGCTACTTCAGCCCGCAGGGAATTCCCTACC
ACTCGGTAGAAACGCTCATCGTTGAGGCACCGGACTACGGGCACGAGACAACTTCGGAGGCGTACAGCTTC
TGGCTCTGGCTCGAAGCGACGTACGGCGCAGTGACCGGCAACTGGACGCCGTTCAACAACGCCTGGACGAC
GATGGAAACGTACATGATCCCGCAGCACGCGGACCAGCCGAACAACGCGTCGTACAACCCCAACAGCCCG
GCGTCGTACGCTCCGGAAGAGCCGCTGCCCAGCATGTACCCGGTTGCCATCGACAGCAGCGTGCCGGTTGG
gcacgacccgctcgccgccgaattgcagtcgacgtacggcactccggacatttacggcatgcactggctgg
ccgacgttgacaacatctacggatacggcgacagccccggcggtggttgcgaactcggtccttccgctaag
ggcgtctcctacatcaacacattccagcgcggctcgcaggagtccgtctgggagacggtcacccagccgac
gtgcgacaacggcaagtacggtggggcgcacggctacgtcgacctgttcatccagggttcgacgccgccgc
agtggaagtacaccgatgccccggacgccgacgcccgtgccgtccaggctgcgtactgggcctacacctgg
gcatcggcgcagggcaaggcaagcgcgattgccccgacgatcgccaaggcgagccaaaccggcgactacc
tgcggtactcgctctttgacaagtacttcaagcaggtcggcaactgctacccggccagctcctgccct
gcaaccggacgccagagcgagacctacctgatcggctggtactacgcctggggcggctcaagccaaggct
gggcctggcgcattggtgacggcgccgcgcacttcggctaccagaatccgcttgccgcgtgggcgatgtcg
aacgtgacaccgctcattccgctctcgcccacggcaaagagcgactgggcggcgagcttgcagcgccagct
ggagttctaccagtggttgcaatccgcggaaggagccattgcgggcggcgccaccaacagctggaacggc
aattacgggaccccgccggccggagactcgaccttctacggcatggcgtacgactgggagccggtctacca
cgacccgccgagcaacaactggttcggcttccaggcgtggtccatggaacgggttgccgagtactactacg
tcaccggcgacccgaaggccaaggcgctgctcgacaagtgggtcgcatgggtgaagccgaatgtcaccac
cggtgcctcatggtcgattccgtcgaatttgtcctggagcggccaaccggatacctggaatccgagcaacc
caggaacgaatgccaacctgcacgtgaccatcacgtcgtccgggcaggacgtcggtgttgccgcggcgctc
gcgaagacactcgagtactacgcggcaaaatccggcgatacggcctcgcgcgacctcgcgaagggattgc
tcgactccatgtggaacaacgaccaggacagcctcggtgtgagcacaccggagacgcggaccgactactct
cggttcactcaggtgtacgacccgacgactggtgacggcctctacatcccgtcgggttggacggggaccat
gcccaacggtgaccaaatcaagccgggtgcgaccttcctgagcatccggtcctggtacaccaaggatccgc
agtggtcgaaggtgcaggcgtacctcaacggcgggcctgctccgacgttcaactaccaccggttctgggcg
gagtccgacttcgcgatggcgaacgccgattttggcatgctcttcccatccgggtcgcccagcccgacccc
gagcccgactccgacgtcgtccccgagcccgactccgagcagctcgccgacgccgtcgcccagcccgtcac
cgaccggcgacaccacgccgccgagcgtgccgacgggtcttcaggtcaccgggacaacgacgtcgtccgtg
tcgctcagctggaccgcgtccaccgacaacgtcggcgtcgcgcactacaacgtgtaccgaaacggcacgct
ggtgggtcagccgacagcgacgtcgttcacggacaccggcctggctgctggcacgtcgtacacgtacacag
tggcggccgttgatgcggccggtaacacgtcggcgcagagcttcgccggtgacagcgacgacggcatcgc
cgtcgcgagcccgtcgccgagcccgactccgacgtcgtccccgagcccaacgccgtcgccgacaccgtcac
cgacgtccaccagcggcgcatcgtgcactgctacctacgttgtcaatagcgactggggtagcggcttcacg
acaaccgtgaccgtgacgaacaccggcaccagggccaccagtggctggacggtcacgtggagctttgccg
gtaatcagacggtcaccaactactggaacaccgcgctgacgcaatccggaaagtcggtgaccgcaaagaa
cctgagttacaacaacgtcatccaacctggtcagtcgacgacctttggattcaacggaagttactcaggaa
caaacaccgcgccgacgctcagctgcacggcaagctga



The Guxl polynucleotide molecules of the invention are preferably isolated molecules encoding the
Guxl polypeptide having an amino acid sequence as shown in Table 1 and SEQ ID NO:1, as well as
derivatives, variants, and useful fragments of the Guxl polynucleotide. The Guxl polynucleotide
sequence can include deletions, substitutions, or additions to the nucleic acid sequence of Table 2 and
SEQ ID NO:2.

The Guxl polynucleotide molecule of the invention can be cDNA, chemically synthesized DNA, DNA
amplified by PCR, RNA, or combinations thereof. Due to the degeneracy of the genetic code, two
DNA sequences may differ and yet encode identical amino acid sequences. The present invention thus
provides an isolated polynucleotide molecule having a Guxl nucleic acid sequence encoding Guxl
polypeptide, where the nucleic acid sequence encodes a polypeptide having the complete amino acid
sequence as shown in Table 1 and SEQ ID NO:1, or variants, derivatives, and fragments thereof.
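
The degeneracy point can be made concrete with a short translation routine. The Python sketch below assumes the standard genetic code and uses hypothetical names (CODON_TABLE, translate); the first test sequence is the opening 21 nucleotides of Table 2, and the second is a made-up synonymous variant constructed for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA coding sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# First seven codons of Table 2, and a hypothetical synonymous variant
# that differs at several third-codon positions.
table2_start = "ATGCCAGGATTACGACGGCGA"
synonymous = "ATGCCGGGCCTGCGTCGCAGA"
same_protein = translate(table2_start) == translate(synonymous)
```

Both sequences translate to the same seven residues that open Table 1, although they differ at eight nucleotide positions.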



The Guxl polynucleotides of the invention have a nucleic acid sequence that is at least about 60%
identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO:2, in some embodiments at
least about 70% identical to the nucleic acid sequence shown in Table 2 and SEQ ID NO:2, and in
other embodiments at least about 90% identical to the nucleic acid sequence shown in Table 2 and SEQ
ID NO:2. Nucleic acid sequence identity is determined by known methods, for example by aligning
two sequences in a software program such as the BLAST program (Altschul, S.F. et al. (1990) J. Mol.
Biol. 215:403-410, from the National Center for Biotechnology Information).

The Guxl polynucleotide molecules of the invention also include isolated polynucleotide molecules
having a nucleic acid sequence that hybridizes under high stringency conditions (as defined above) to
the nucleic acid sequence shown in Table 2 and SEQ ID NO:2. Hybridization of the polynucleotide is
to at least about 15 contiguous nucleotides, or at least about 20 contiguous nucleotides, and in other
embodiments at least about 30 contiguous nucleotides, and in still other embodiments at least about 100
contiguous nucleotides of the nucleic acid sequence shown in Table 2 and SEQ ID NO:2.
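
Hybridization itself is an experimental determination, but a crude in silico screen for the contiguous-nucleotide lengths recited above might look like the following Python sketch. It only tests for an exact shared run between two sequences written on the same strand; the function names are hypothetical, and exact matching is an assumption that ignores the mismatches real hybridization can tolerate.

```python
def shares_contiguous_run(candidate, reference, min_length=15):
    """True if the two sequences share an exact run of min_length nucleotides."""
    if min_length <= 0:
        raise ValueError("min_length must be positive")
    candidate = candidate.upper()
    reference = reference.upper()
    # Index every window of the reference, then probe with candidate windows.
    windows = {reference[i:i + min_length]
               for i in range(len(reference) - min_length + 1)}
    return any(candidate[i:i + min_length] in windows
               for i in range(len(candidate) - min_length + 1))


def run_lengths_met(candidate, reference, lengths=(15, 20, 30, 100)):
    """Return the recited run lengths for which an exact shared run exists."""
    return [n for n in lengths if shares_contiguous_run(candidate, reference, n)]
```

A candidate passing this screen would still need to be tested under the stated hybridization conditions.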



Useful fragments of the Guxl-encoding polynucleotide molecules described herein include probes and
primers. Such probes and primers can be used, for example, in PCR methods to amplify and detect the
presence of Guxl polynucleotides in vitro, as well as in Southern and Northern blots for analysis of
Guxl. Cells expressing the Guxl polynucleotide molecules of the invention can also be identified by
the use of such probes. Methods for the production and use of such primers and probes are known. For
PCR, 5' and 3' primers corresponding to a region at the termini of the Guxl polynucleotide molecule
can be employed to isolate and amplify the Guxl polynucleotide using conventional techniques.
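
As a minimal sketch of how primers corresponding to the termini might be derived from a coding sequence, the Python below takes the first bases as the 5' primer and the reverse complement of the last bases as the 3' primer. The function names and the 20-nucleotide default are assumptions; practical primer design also weighs melting temperature, GC content, and added restriction sites, none of which is modelled here. The toy template simply joins the first 30 and last 30 nucleotides of Table 2 as a stand-in for the full sequence.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


def terminal_primers(template, length=20):
    """Return (forward, reverse) primers matching the two ends of template."""
    if len(template) < 2 * length:
        raise ValueError("template is too short for two primers of this length")
    forward = template[:length]
    reverse = reverse_complement(template[-length:])
    return forward, reverse


# Toy stand-in: first 30 and last 30 nucleotides of Table 2, joined.
toy_template = ("ATGCCAGGATTACGACGGCGACTCCGCGCC"
                "GCGCCGACGCTCAGCTGCACGGCAAGCTGA")
forward_primer, reverse_primer = terminal_primers(toy_template)
```

The forward primer reads along the coding strand from the start codon, while the reverse primer anneals to the coding strand at the stop codon end.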



Other useful fragments of the Guxl polynucleotides include antisense or sense oligonucleotides 
comprising a single-stranded nucleic acid sequence capable of binding to a target Guxl mRNA (using a 
sense strand), or DNA (using an antisense strand) sequence. 

Vectors and Host Cells: 

The present invention also provides vectors containing the polynucleotide molecules of the invention, 
as well as host cells transformed with such vectors. Any of the polynucleotide molecules of the 
invention may be contained in a vector, which generally includes a selectable marker and an origin of 
replication, for propagation in a host. The vectors further include suitable transcriptional or 
translational regulatory sequences, such as those derived from mammalian, microbial, viral, or insect
genes, operably linked to the Guxl polynucleotide molecule. Examples of such regulatory sequences 
include transcriptional promoters, operators, or enhancers, mRNA ribosomal binding sites, and 
appropriate sequences which control transcription and translation. Nucleotide sequences are operably 
linked when the regulatory sequence functionally relates to the DNA encoding the target protein. Thus, 
a promoter nucleotide sequence is operably linked to a Guxl DNA sequence if the promoter nucleotide 
sequence directs the transcription of the Guxl sequence. 

Selection of suitable vectors for the cloning of Guxl polynucleotide molecules encoding the target 
Guxl polypeptides of this invention will depend upon the host cell in which the vector will be 
transformed, and, where applicable, the host cell from which the target polypeptide is to be expressed.
Suitable host cells for expression of Guxl polypeptides include prokaryotes, yeast, and higher 
eukaryotic cells, each of which is discussed below. 



The Guxl polypeptides to be expressed in such host cells may also be fusion proteins that include
regions from heterologous proteins. As discussed above, such regions may be included to allow, for
example, secretion, improved stability, or facilitated purification of the Guxl polypeptide. For
example, a nucleic acid sequence encoding an appropriate signal peptide can be incorporated into an
expression vector. A nucleic acid sequence encoding a signal peptide (secretory leader) may be fused
in-frame to the Guxl sequence so that Guxl is translated as a fusion protein comprising the signal
peptide. A signal peptide that is functional in the intended host cell promotes extracellular secretion of
the Guxl polypeptide. Preferably, the signal sequence will be cleaved from the Guxl polypeptide upon
secretion of Guxl from the cell. Non-limiting examples of signal sequences that can be used in
practicing the invention include the yeast α-factor and the honeybee melittin leader in Sf9 insect cells.

Suitable host cells for expression of target polypeptides of the invention include prokaryotes, yeast, and
higher eukaryotic cells. Suitable prokaryotic hosts to be used for the expression of these polypeptides
include bacteria of the genera Escherichia, Bacillus, and Salmonella, as well as members of the genera
Pseudomonas, Streptomyces, and Staphylococcus. For expression in prokaryotic cells, for example, in
E. coli, the polynucleotide molecule encoding Guxl polypeptide preferably includes an N-terminal
methionine residue to facilitate expression of the recombinant polypeptide. The N-terminal Met may
optionally be cleaved from the expressed polypeptide.

Expression vectors for use in prokaryotic hosts generally comprise one or more phenotypic selectable
marker genes. Such genes encode, for example, a protein that confers antibiotic resistance or that
supplies an auxotrophic requirement. A wide variety of such vectors are readily available from
commercial sources. Examples include pSPORT vectors, pGEM vectors (Promega, Madison, WI),
pPROEX vectors (LTI, Bethesda, MD), Bluescript vectors (Stratagene), and pQE vectors (Qiagen).

Guxl can also be expressed in yeast host cells from genera including Saccharomyces, Pichia, and
Kluyveromyces. Preferred yeast hosts are S. cerevisiae and P. pastoris. Yeast vectors will often contain
an origin of replication sequence from a 2μ yeast plasmid, an autonomously replicating sequence
(ARS), a promoter region, sequences for polyadenylation, sequences for transcription termination, and
a selectable marker gene. Vectors replicable in both yeast and E. coli (termed shuttle vectors) may also
be used. In addition to the above-mentioned features of yeast vectors, a shuttle vector will also include
sequences for replication and selection in E. coli. Direct secretion of the target polypeptides expressed
in yeast hosts may be accomplished by the inclusion of nucleotide sequence encoding the yeast α-factor
leader sequence at the 5' end of the Guxl-encoding nucleotide sequence.



Insect host cell culture systems can also be used for the expression of Guxl polypeptides. The target
polypeptides of the invention are preferably expressed using a baculovirus expression system, as
described, for example, in the review by Luckow and Summers, 1988 Bio/Technology 6:47.

The choice of a suitable expression vector for expression of Guxl polypeptides of the invention will
depend upon the host cell to be used. Examples of suitable expression vectors for E. coli include pET,
pUC, and similar vectors as is known in the art. Preferred vectors for expression of the Guxl
polypeptides include the shuttle plasmid pIJ702 for Streptomyces lividans, pGAPZalpha-A, B, C and
pPICZalpha-A, B, C (Invitrogen) for Pichia pastoris, and pFE-1 and pFE-2 for filamentous fungi and
similar vectors as is known in the art.

Modification of a Guxl polynucleotide molecule to facilitate insertion into a particular vector (for
example, by modifying restriction sites), ease of use in a particular expression system or host (for
example, using preferred host codons), and the like, are known and are contemplated for use in the
invention. Genetic engineering methods for the production of Guxl polypeptides include the
expression of the polynucleotide molecules in cell-free expression systems, in cellular hosts, in
tissues, and in animal models, according to known methods.

Compositions 

The invention provides compositions containing a substantially purified Guxl polypeptide of the 
invention and an acceptable carrier. Such compositions are administered to biomass, for example, to 
degrade the cellulose in the biomass into simpler carbohydrate units and ultimately, to sugars. These 
released sugars from the cellulose are converted into ethanol by any number of different catalysts.
Such compositions may also be included in detergents for removal, for example, of cellulose 
containing stains within fabrics, or compositions used in the pulp and paper industry, to address 
conditions associated with cellulose content. Compositions of the present invention can be used in 
stonewashing jeans such as is well known in the art. Compositions can be used in the biopolishing 
of cellulosic fabrics, such as cotton, linen, rayon and Lyocell. 

The invention provides pharmaceutical compositions containing a substantially purified Guxl
polypeptide of the invention and if necessary a pharmaceutically acceptable carrier. Such
pharmaceutical compositions are administered to cells, tissues, or patients, for example, to aid in
delivery or targeting of other pharmaceutical compositions. For example, Guxl polypeptides may be
used where carbohydrate-mediated liposomal interactions are involved with target cells. Vyas SP et
al. (2001), J. Pharmacy & Pharmaceutical Sciences May-Aug 4(2): 138-58.

The invention also provides reagents, compositions, and methods that are useful for analysis of Guxl
activity and for the analysis of cellulose breakdown.

Compositions of the present invention may also include other known cellulases, and preferably, other
known thermal tolerant cellulases for enhanced treatment of cellulose. 

Antibodies 

The polypeptides of the present invention, in whole or in part, may be used to raise polyclonal and
monoclonal antibodies that are useful in purifying Guxl, or detecting Guxl polypeptide expression, as
well as a reagent tool for characterizing the molecular actions of the Guxl polypeptide. Preferably, a
peptide containing a unique epitope of the Guxl polypeptide is used in preparation of antibodies, using
conventional techniques. Methods for the selection of peptide epitopes and production of antibodies
are known. See, for example, Antibodies: A Laboratory Manual, Harlow and Lane (eds.), 1988 Cold
Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.; Monoclonal Antibodies, Hybridomas: A
New Dimension in Biological Analyses, Kennett et al. (eds.), 1980 Plenum Press, New York.

Assays

Agents that modify, for example, increase or decrease, Guxl hydrolysis or degradation of cellulose
can be identified, for example, by assay of Guxl cellulase activity and/or analysis of Guxl binding to
a cellulose substrate. Incubation of cellulose in the presence of Guxl and in the presence or absence
of a test agent and correlation of cellulase activity or carbohydrate binding permits screening of such
agents. For example, cellulase activity and binding assays may be performed in a manner similar to
those described in Irwin et al., J. Bacteriology 180(7): 1709-1714 (April 1998).

25 

The Guxl-stimulated activity is determined in the presence and in the absence of a test agent, and the 
two values are compared. A lower Guxl-activated test activity in the presence of the test agent than in 
the absence of the test agent indicates that the test agent has decreased the activity of Guxl. A higher 
Guxl-activated test activity in the presence of the test agent than in the absence of the test agent 
indicates that the test agent has increased the activity of Guxl. Stimulators and inhibitors of Guxl may 
be used to augment, inhibit, or modify Guxl-mediated activity, and therefore may have potential 
industrial uses as well as potential use in the further elucidation of Guxl's molecular actions. 
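
The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the use of replicate means, and the 10 % relative tolerance are hypothetical choices and are not specified anywhere in this disclosure.

```python
from statistics import mean


def classify_test_agent(activity_without, activity_with, tolerance=0.10):
    """Compare Guxl activity measured without and with a test agent.

    Both arguments are lists of replicate activity readings in the same
    (arbitrary) units. `tolerance` is a hypothetical relative threshold below
    which a difference is treated as no clear effect.
    Returns (label, relative_change).
    """
    baseline = mean(activity_without)
    treated = mean(activity_with)
    if baseline <= 0:
        raise ValueError("baseline activity must be positive")
    change = (treated - baseline) / baseline
    if change <= -tolerance:
        return "inhibitor", change      # lower activity with the agent
    if change >= tolerance:
        return "stimulator", change     # higher activity with the agent
    return "no clear effect", change


if __name__ == "__main__":
    print(classify_test_agent([1.00, 1.04, 0.96], [0.52, 0.49, 0.50]))
```

A real screen would also need replicates, controls, and a statistical test rather than a fixed threshold; the sketch only mirrors the higher-or-lower logic stated in the text.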

Therapeutic Applications 

The Guxl polypeptides of the invention are effective in aiding in delivery or targeting of other 
pharmaceutical compositions within a host. For example, Guxl polypeptides may be used where 
carbohydrate-mediated liposomal interactions are involved with target cells. Vyas SP et al. (2001), 
J. Pharm Pharm Sci May-Aug 4(2): 138-58. 

Guxl polynucleotides and polypeptides, including vectors expressing Guxl, of the invention can be 
formulated as pharmaceutical compositions and administered to a host, preferably a mammalian host, 
including a human patient, in a variety of forms adapted to the chosen route of administration. The 
compounds are preferably administered in combination with a pharmaceutically acceptable carrier, 
and may be combined with or conjugated to specific delivery agents, including targeting antibodies 
and/or cytokines. 

Guxl can be administered by known techniques, such as orally, parenterally (including subcutaneous 
injection, intravenous, intramuscular, intrasternal or infusion techniques), by inhalation spray, 
topically, by absorption through a mucous membrane, or rectally, in dosage unit formulations 
containing conventional non-toxic pharmaceutically acceptable carriers, adjuvants or vehicles. 
Pharmaceutical compositions of the invention can be in the form of suspensions or tablets suitable 
for oral administration, nasal sprays, creams, sterile injectable preparations, such as sterile injectable 
aqueous or oleaginous suspensions, or suppositories. 

For oral administration as a suspension, the compositions can be prepared according to techniques 
well-known in the art of pharmaceutical formulation. The compositions can contain microcrystalline 
cellulose for imparting bulk, alginic acid or sodium alginate as a suspending agent, methylcellulose 
as a viscosity enhancer, and sweeteners or flavoring agents. As immediate release tablets, the 
compositions can contain microcrystalline cellulose, starch, magnesium stearate and lactose or other 
excipients, binders, extenders, disintegrants, diluents and lubricants known in the art. 

For administration by inhalation or aerosol, the compositions can be prepared according to 
techniques well-known in the art of pharmaceutical formulation. The compositions can be prepared 
as solutions in saline, using benzyl alcohol or other suitable preservatives, absorption promoters to 
enhance bioavailability, fluorocarbons or other solubilizing or dispersing agents known in the art. 

For administration as injectable solutions or suspensions, the compositions can be formulated 
according to techniques well-known in the art, using suitable dispersing or wetting and suspending 
agents, such as sterile oils, including synthetic mono- or diglycerides, and fatty acids, including oleic 
acid. 

For rectal administration as suppositories, the compositions can be prepared by mixing with a 
suitable non-irritating excipient, such as cocoa butter, synthetic glyceride esters or polyethylene 
glycols, which are solid at ambient temperatures, but liquefy or dissolve in the rectal cavity to release 
the drug. 

Preferred administration routes include oral and parenteral routes, including intravenous, 
intramuscular, and subcutaneous routes. More preferably, the compounds of the present invention are 
administered parenterally, i.e., intravenously or intraperitoneally, by infusion or injection. 

Solutions or suspensions of the compounds can be prepared in water or isotonic saline (PBS), 
optionally mixed with a nontoxic surfactant. Dispersions may also be prepared in glycerol, liquid 
polyethylene glycols, DNA, vegetable oils, triacetin and mixtures thereof. Under ordinary 
conditions of storage and use, these preparations may contain a preservative to prevent the growth of 
microorganisms. 

The pharmaceutical dosage form suitable for injection or infusion use can include sterile, aqueous 
solutions or dispersions or sterile powders comprising an active ingredient which are adapted for the 
extemporaneous preparation of sterile injectable or infusible solutions or dispersions. In all cases, 
the ultimate dosage form should be sterile, fluid and stable under the conditions of manufacture and 
storage. The liquid carrier or vehicle can be a solvent or liquid dispersion medium comprising, for 
example, water, ethanol, a polyol such as glycerol, propylene glycol, or liquid polyethylene glycols 
and the like, vegetable oils, nontoxic glyceryl esters, and suitable mixtures thereof. The proper 
fluidity can be maintained, for example, by the formation of liposomes, by the maintenance of the 
required particle size, in the case of dispersion, or by the use of nontoxic surfactants. The prevention 
of the action of microorganisms can be accomplished by various antibacterial and antifungal agents, 
for example, parabens, chlorobutanol, phenol, sorbic acid, thimerosal, and the like. In many cases, it 
will be desirable to include isotonic agents, for example, sugars, buffers, or sodium chloride. 
Prolonged absorption of the injectable compositions can be brought about by the inclusion in the 
composition of agents delaying absorption, for example, aluminum monostearate hydrogels and 
gelatin. 

Sterile injectable solutions are prepared by incorporating the compounds in the required amount in 
the appropriate solvent with various other ingredients as enumerated above and, as required, 
followed by filter sterilization. In the case of sterile powders for the preparation of sterile injectable 
solutions, the preferred methods of preparation are vacuum drying and freeze-drying techniques, 
which yield a powder of the active ingredient plus any additional desired ingredient present in the 
previously sterile-filtered solutions. 

Industrial Applications 

The Guxl polypeptides of the invention are effective cellulases. In the methods of the invention, the 
cellulose degrading effects of Guxl are achieved by treating biomass at a ratio of about 1 to about 50 
of Guxl:biomass. Guxl may be used under extreme conditions, for example, elevated temperatures 
and acidic pH. Treated biomass is degraded into simpler forms of carbohydrates, and in some cases 
glucose, which is then used in the formation of ethanol or other industrial chemicals, as is known in 
the art. Other methods are envisioned to be within the scope of the present invention, including 
methods for treating fabrics to remove cellulose-containing stains and other methods already 
discussed. Guxl polypeptides can be used in any known application currently utilizing a cellulase, 
all of which are within the scope of the present invention. 

Having generally described the invention, the same will be more readily understood by reference to 
the following examples, which are provided by way of illustration and are not intended as limiting. 

Example 1: Molecular Cloning of Guxl 

Genomic DNA was isolated from Acidothermus cellulolyticus and purified by banding on cesium 
chloride gradients. Genomic DNA was partially digested with Sau 3A and separated on agarose gels. 
DNA fragments in the range of 9-20 kilobase pairs were isolated from the gels. This purified Sau 3A 
digested genomic DNA was ligated into the Bam HI acceptor site of purified EMBL3 lambda phage 
arms (Clontech, San Diego, Calif.). Phage DNA was packaged according to the manufacturer's 
specification and plated with E. coli LE392 in top agar which contained the soluble cellulose analog, 
carboxymethylcellulose (CMC). The plates were incubated overnight (12-24 hours) to allow 
transfection, bacterial growth, and plaque formation. Plates were stained with Congo Red followed 
by destaining with 1 M NaCl. Lambda plaques harboring endoglucanase clones showed up as 
unstained plaques on a red background. 

Lambda clones which screened positive on CMC-Congo Red plates were purified by successive 
rounds of picking, plating and screening. Individual phage isolates were named SL-1, SL-2, SL-3, 
and SL-4. Subsequent subcloning efforts employed the SL-3 clone which contained an 
approximately 14.2 kilobase fragment of Acidothermus cellulolyticus genomic DNA. 

Template DNA was constructed using a 9 kilobase Bam HI fragment obtained from the 14.2 
kilobase lambda clone SL-3 prepared from Acidothermus cellulolyticus genomic DNA. The 9 
kilobase Bam HI fragment from SL-3 was subcloned into pDR540 to generate a plasmid NREL501. 
NREL501 was sequenced by the primer walking method as is known in the art. NREL501 was then 
subcloned into pUC19 using restriction enzymes Pst I and Eco RI and transformed into E. coli 
XL1-Blue (Stratagene) for the production of template DNA for sequencing. Each subclone was 
sequenced from both the forward and reverse directions. DNA for sequencing was prepared from an 
overnight growth in 500 mL LB broth using a megaprep DNA purification kit from Promega. The 
template DNA was PEG precipitated, suspended in de-ionized water, and adjusted to a final 
concentration of 0.25 milligrams/mL. 

Custom primers were designed by reading upstream known sequence and selecting segments of an 
appropriate length to function, as is well known in the art. Primers for cycle sequencing were 
synthesized at the Macromolecular Resources Facility located at Colorado State University in Fort 
Collins, Colorado. Typically the sequencing primers were 26 to 30 nucleotides in length, but were 
sometimes longer or shorter to accommodate a melting temperature appropriate for cycle sequencing. 
The sequencing primers were diluted in de-ionized water, the concentration measured using UV 
absorbance at 260 nm, and then adjusted to a final concentration of 5 pmol/microL. 
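
As a rough illustration of the primer bookkeeping described above, the following Python sketch estimates GC content and melting temperature for a candidate primer and computes how much water to add to reach 5 pmol/microL. The melting-temperature expression is a commonly used basic approximation for oligonucleotides longer than about 13 nucleotides; it is an assumption here and is not stated to be the method the inventors used. All function names are hypothetical.

```python
def _validated(primer):
    """Upper-case the primer and reject anything other than A, C, G, T."""
    seq = primer.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    return seq


def gc_fraction(primer):
    """Fraction of G and C bases in the primer (0.0 to 1.0)."""
    seq = _validated(primer)
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm_celsius(primer):
    """Approximate Tm: 64.9 + 41 * (G + C - 16.4) / N (assumed formula)."""
    seq = _validated(primer)
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def water_to_add_ul(stock_pmol_per_ul, stock_volume_ul, target_pmol_per_ul=5.0):
    """Volume of water needed to dilute a primer stock to the target concentration."""
    if stock_pmol_per_ul < target_pmol_per_ul:
        raise ValueError("stock is already more dilute than the target")
    final_volume = stock_pmol_per_ul * stock_volume_ul / target_pmol_per_ul
    return final_volume - stock_volume_ul


if __name__ == "__main__":
    candidate = "GC" * 13  # hypothetical 26-nucleotide, fully GC primer
    print(gc_fraction(candidate), round(basic_tm_celsius(candidate), 1))
    print(water_to_add_ul(50.0, 10.0))
```

For high GC templates such as the one described here, nearest-neighbor thermodynamic models generally give better estimates than this simple expression.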



Templates and sequencing primers were shipped to the Iowa State University DNA Sequencing 
Facility at Ames, Iowa for sequencing using standard chemistries for cycle sequencing. In some 
cases, regions of the template that sequenced poorly using the standard protocols and dye terminators 
were repeated with the addition of 2 microL DMSO and by using nucleotides optimized for the 
sequencing of high GC content DNA. 

Sequencing data from primer walking and subclones were assembled together to verify that all SL-3 
regions had been sequenced from both strands. An open reading frame (ORF), termed Guxl, was 
found in the 9 kilobase Bam HI fragment, C-terminal of E1 (U.S. Patent 5,536,655). The ORF of 
3366 bp [SEQ ID NO:2] and the deduced amino acid sequence [SEQ ID NO:1] are shown in Tables 1 
and 2. The amino acid sequence predicted by SEQ ID NO:1 was determined to have significant 
homology to known cellulases, as is shown below in Example 2 and Table 3. 
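
For readers unfamiliar with ORF identification, the following Python sketch shows one simple way to locate the longest open reading frame in an assembled DNA sequence by scanning all six reading frames. It is an illustration only and is not the procedure used to identify Guxl. The inclusion of GTG as an alternative start codon is an assumption often made for bacterial genes, and the function names are hypothetical. As a point of arithmetic, 3366 bp corresponds to 1122 codons; if that count includes the stop codon, the encoded polypeptide would have 1121 residues.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG"}  # GTG is an assumed alternative bacterial start


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(dna):
    """Return (strand, start, end, length_bp) of the longest start-to-stop ORF.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand.
    The stop codon is included in the length. Returns None if no ORF exists.
    """
    best = None
    seq = dna.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon in START_CODONS:
                    start = i                      # open a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    length = i + 3 - start         # close it at the stop
                    if best is None or length > best[3]:
                        best = (strand, start, i + 3, length)
                    start = None
    return best


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"  # toy sequence, not Guxl
    print(longest_orf(toy))
```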

The amino acid sequence represents a novel member of the family of proteins with cellulase activity. 
Due to the source of isolation, from the thermophilic Acidothermus cellulolyticus, Guxl is a novel 
member of cellulases with properties including thermal tolerance. It is also known that thermal 
tolerant enzymes may have other properties (see definition above). 

Example 2: Guxl includes a GH48 catalytic domain 

Sequence alignments and comparisons of the amino acid sequences of the Acidothermus 
cellulolyticus Guxl catalytic domain (approximately amino acids 231 to 870), Cellulomonas fimi 
(cellobiohydrolase B) and Thermobifida fusca (exocellulase E6) polypeptides were prepared, using 
the ClustalW program (Thompson JD et al. (1994), Nucleic Acids Res, 22:4673-4680) from the EMBL 
European Bioinformatics Institute website (http://www.ebi.ac.uk/). An examination of the amino 
acid sequence alignment of the GH48 domains indicates that the amino acid sequence of the Guxl 
catalytic domain is homologous to the amino acid sequences of known GH48 family catalytic 
domains for C. fimi cellobiohydrolase B and T. fusca exocellulase E6 (see Table 3). In Table 3, the 
notations are as follows: an asterisk indicates identical or conserved residues in all sequences in 
the alignment; a colon indicates conserved substitutions; a period indicates semi-conserved 
substitutions; and a hyphen indicates a gap in the sequence. The amino acid sequence predicted 
for the Guxl GH48 domain is approximately 64 % identical to the C. fimi cellobiohydrolase B GH48 
domain and approximately 57 % identical to the T. fusca exocellulase E6 GH48 domain, indicating 
that the Guxl catalytic domain is a member of the GH48 family (Henrissat et al., (1991) supra). 
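
The percent identity figures quoted above come from pairwise comparison of aligned sequences. A minimal Python sketch of such a calculation follows. It assumes that identity is computed over alignment columns in which neither sequence has a gap; other conventions (for example, dividing by the length of the shorter sequence) exist and give slightly different numbers, and this disclosure does not say which was used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two rows of an alignment.

    Both rows must have the same length. Columns where either row holds the
    gap character are skipped (an assumed convention).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue                 # ignore gapped columns
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Toy rows, not taken from Table 3.
    print(round(percent_identity("ACD-EFG", "ACDKEYG"), 1))
```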

Table 3. Multiple amino acid sequence alignment of a Guxl first catalytic domain and 
polypeptides with Glycoside Hydrolase Family 48 catalytic domains. 

Multialignment of related Glycoside Hydrolase Family 48 catalytic domains 
GH48_Ace: Acidothermus cellulolyticus Guxl catalytic domain GH48 
GuxB_Cfi: Cellulomonas fimi CBHB (beta-1,4-exocellobiohydrolase). GenBank Acc. # AAB00822 
E6_Tfu: Thermobifida fusca E6 (beta-1,4-exocellulase). GenBank Acc. # AF144563 

GH48_Ace  PYIQRFLTN™kIHDPANGYFSPQG IPYHSVETLIVEAPDYGHETTSEAYSFWLWLEATY 
GuxB_Cfi  EYAQRFLAQYBKIKDPANGYFSAQG- - - 1 PYHAVETLMVEAPDYGHETTSEAYSYWLWLEALY 
E6_Tfu    SYDQAFLEQYEklKDPASGYFREFNGLLVPYHSVETMIVEAPDHGHQTTSEAFSYYLWLEAYY 
          * * ** *.#^*. *** *** .***.***..*****.**.*****.*..***** * 

GH48_Ace  gavtgnwtpfnnaVttmetymi pqhadqpnnasynpnspas yapeeplpsmypvaidssv 
GuxB_Cfi  GQVTQDWAPLNHAWbTMEKYMIPQSVDQPTNSFYNPNSPATYAPEFNHPSSYPSQLNSGI 
E6_Tfu    GRVTGDWKPLHDAW3SMETFIIPGTKDQPTNSAYNPNSPATYIPEQPNADGYPSPLMNNV 
          * ** .* *.. ** \**.::** ***_*. *******.* ** ^ ^ -k* . ^ . 

GH48_Ace  PVGHDPIJ^LQSTYGTf DIYGMHWI^VDNIYGYGDSPGGGCELGPSAKGVSYINTFQR 
GuxB_Cfi  SGGTDPIGAELKATYGNMVYQMHWLADVDNIYGFGATPGAGCTLGPTATGTSFINTFQR 
E6_Tfu    PVGQDPLAQELSSTYGTNfelYGMHWLLDVDNVYGFGFCGDG TDDAPAYINTYQR 
          * V* ; , ..***.** 

GH48_Ace  gsqeswetvtqptcdngkyVsahgyvdlfiqg-stppqwkytdapdadaravqaayway 
GuxB_Cfi  gpqesvwetvpqpsceefkygskngyldlftkdasyakqwkytsasdadaraveavywan 
E6_Tfu    garesvweti phpscddfthgotngyldlftddqnyakqwrytnapdadaravqvmfwah 
          *_.******. .*.*.. .**Y**.*** ^ ^ ^ ^ **.**********. ^ 

GH48_Ace  twasaqgkasaiaptiakasqtgdylryslfdkyfkqvgncypasscpgatgrqsetyli 
GuxB_Cfi  QWATEQGKAADVAATVAKAAKMGDY^YTLFDKYFKKIG- - CTSPTCAAGQGREAAHYLL 
E6_Tfu    ewakeqgkeneiaglmdkaskmgdyl\yamfdkyfkkigncvgatscpggqgkdsah^ 
          *** .* . **:: ****iSf . .******,. * *::: **: 

GH48_Ace  GWYYAWGGS-- -SQGWAWRIGDGAAHFg\qNPLAAWAMSNVTPLIPLSPTAKSDWAASLQ 
GuxB_Cfi  SVmiAWGGATDTSSGWAWRIGSSHAHFGYffiNPLAAWALSTDPKLTPKSPTAKADWAASMQ 
E6_Tfu    SWYYSWGGSLDTSSAWAWRIGSSSSHQGY&VIAAYALSQVPELQPDSPTGVQDWATSFD 
          ** .***. *_*******_ .* ***« ***.*.* _ * * ***_ ***.*.. 

GH48_Ace  RQLEFYQWLQSAEGAIAGGATNSWNGNYGTPHftGDSTFYGMAYDWEPVYHDPPSNNWFGF 
GuxB_Cfi  RQLEFYTWLQASNGGIAGGATNSWDGAYAQPPAGTPTFYGMGYTEAPVYVDPPSNRWFGM 
E6_Tfu    RQLEFLQWLQSAEGGIAGGATNSWKGSYDTPPTciLSQFYGMYYDWQPVWNDPPSNNWFGF 
          * ***** ***...************ **. *\ **** * **. ***** ***. 

GH48_Ace  QAWSMERVAEYYYVTGDPKAKALLDKWVAWVKPNVTOG ASWSIPSNLSWSGQPDT 
GuxB_Cfi  QAWGVQRVAELYYASGNAQAKKI LDKWVPWWANI STDG ASWKVPS ELKWTGKPDT 
E6_Tfu    QVWNMERVAQLYYVTGDARAEAILDKWVPWAIQHTDvi\ADNGGQNFQVPSDLEWSGQPDT 
          *.*.::***: **.:*:.:*: ;******. *^ : ..\ 

GH48_Ace  WNPSNPGTNANIJIVTITSSGQDVGVAAALAKTLEYYAAKgfeDTASRDLAKGLIJDSMWNND 
GuxB_Cfi  WNAAAPTGNPGLTVEVTSYGQDVGVAADTARALLFYAAKSQDTASRDKAKALLDAIWANN 
E6_Tfu    WTG-TYTGNPNI^QWSYSQDVGVTAAIAKTLMYYAKRSGM'TAIiATAEGLIJ3A^ 
          * *..* * :,* .*****;* *::* :** :** *\ : : *:.***:: : 

GH48_Ace  QDSLGVSTPETRTDYSRFTQVYDPTTGDGLYIPSGWTGTMPNGDQIKPGATFLSIRSWYT 
GuxB_Cfi  QDPLGVSAVETRGDYKRFDDTYVAN-GDGIYIPSGWTGTMPNGD^KPGVSFLDIRSFYK 
E6_Tfu    -DSIGIATPEQ-PSWDRIiDDPWDGS--EGLYVPPGWSGTMPNGDR3f:PGATFLSIRSFYK 
          *.:*::: * .:.*: : : .*.*.***.******* *\ * *. ;**,***.*, 

GH48_Ace  KDPQWSKVQAYLNGG PAPTFNYHRFWAESDFAMANADFGMLFPSGS^P 
GuxB_Cfi  KDPNWSKVQTFLDGG AEPQFRYHRFWAQTAVAGALADYARLFDDGT 
E6_Tfu    NDPLWPQVEAHLNDPQNVPAPIVERHRFWAQVEIATAFAAHDELFGAGAP 
          .** * .*.. *. * *****. * * * ** 



Example 3: Mixed Domain GH48, CBD II, CBD III Genes and Hybrid Polypeptides 

From the putative locations of the domains in the Guxl cellulase sequence given above and in 
comparable cloned cellulase sequences from other species, one can separate individual domains and 
combine them with one or more domains from different sequences. The significant similarity 
between cellulase genes permits one, by recombinant techniques, to arrange one or more domains from 
the Acidothermus cellulolyticus Guxl cellulase gene with one or more domains from a cellulase gene 
from one or more other microorganisms. Other representative endoglucanase genes include Bacillus 
polymyxa beta-(1,4) endoglucanase (Baird et al, Journal of Bacteriology, 172: 1576-86 (1992)) and 
Xanthomonas campestris beta-(1,4)-endoglucanase A (Gough et al, Gene 89:53-59 (1990)). The 
result of the fusion of any two or more domains will, upon expression, be a hybrid polypeptide. 
Such hybrid polypeptides can have one or more catalytic or binding domains. For ease of 
manipulation, recombinant techniques may be employed such as the addition of restriction enzyme 
sites by site-specific mutagenesis. If one is not using one domain of a particular gene, any number of 
any type of change including complete deletion may be made in the unused domain for convenience 
of manipulation. 
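
The domain shuffling described above can be pictured, at the level of the encoded polypeptide, with the following Python sketch. It is purely illustrative: actual hybrids would be built at the gene level by recombinant techniques, and the names `Domain`, `extract_domain`, and `build_hybrid`, the toy sequences, and the coordinates in the usage example are all hypothetical. For Guxl itself, the only coordinates given in this disclosure are the approximate catalytic-domain boundaries (amino acids 231 to 870).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A named region of a polypeptide, 1-based and inclusive at both ends."""
    name: str
    start: int
    end: int


def extract_domain(protein, domain):
    """Return the amino acid string covered by `domain`."""
    if not (1 <= domain.start <= domain.end <= len(protein)):
        raise ValueError(f"domain {domain.name!r} lies outside the sequence")
    return protein[domain.start - 1:domain.end]


def build_hybrid(parts, linker=""):
    """Join domains taken from different parent sequences, in the order given.

    `parts` is a sequence of (protein, Domain) pairs. `linker` is an optional
    amino acid string placed between consecutive domains.
    """
    return linker.join(extract_domain(seq, dom) for seq, dom in parts)


if __name__ == "__main__":
    parent_a = "MKTAYIAKQR"   # toy sequence, not a real cellulase
    parent_b = "GSSHHHHHHW"   # toy sequence, not a real cellulase
    hybrid = build_hybrid([
        (parent_a, Domain("catalytic", 2, 5)),
        (parent_b, Domain("binding", 8, 10)),
    ])
    print(hybrid)
```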

It is understood for purposes of this disclosure, that various changes and modifications may be made 
to the invention that are well within the scope of the invention. Numerous other changes may be 
made which will readily suggest themselves to those skilled in the art and which are encompassed in 
the spirit of the invention disclosed herein and as defined in the appended claims. 

This specification contains numerous citations to references such as patents, patent applications, and 
publications. Each is hereby incorporated by reference for all purposes. 